Confounding from Cryptic Relatedness in Case-Control Association Studies

Author: Benjamin F Voight
Goncalo Abecasis
Jonathan K Pritchard
Publication venue: Public Library of Science
Publication date: 01/01/2005

Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. In contrast, studies with a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.
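To illustrate why undetected relatives matter, the sketch below uses a standard variance identity rather than the expressions derived in the paper. Assuming Hardy-Weinberg equilibrium, no inbreeding, and known pairwise kinship coefficients phi_ij, the total allele count X in n diploid individuals has Var(X) = 2p(1-p)(n + 4 * sum of phi_ij over pairs i < j). A test that treats the sample as unrelated therefore underestimates the variance by the factor computed here, and a sampling bias toward relatives raises that factor. The function name and the example counts of relative pairs are hypothetical.

```python
def variance_inflation(pair_kinships, n):
    """Inflation of Var(total allele count) in a sample of n diploid individuals.

    Assumes Hardy-Weinberg equilibrium, no inbreeding, and that pair_kinships
    lists the kinship coefficient phi_ij of each related pair (i < j); pairs
    not listed are treated as unrelated (phi = 0).
    Var(X) = 2p(1-p) * (n + 4 * sum(phi_ij)), so the ratio to an unrelated
    sample of the same size is 1 + 4 * sum(phi_ij) / n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 + 4.0 * sum(pair_kinships) / n


# Hypothetical example: 1,000 cases containing 10 unrecognized full-sib pairs
# (phi = 1/4) and 40 first-cousin pairs (phi = 1/16).
kinships = [0.25] * 10 + [1 / 16] * 40
print(f"{variance_inflation(kinships, 1000):.3f}")  # prints 1.020
```

Because each pair's contribution is divided by n, a handful of relatives in a large, well-designed study changes little, which is consistent in spirit with the abstract's conclusion; many close relatives in a modest sample would not be negligible.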

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MaCH: Using sequence and genotype data to estimate haplotypes and unobserved genotypes

Author: Abecasis Goncalo R.
Ding Jun
Li Yun
Scheet Paul
Willer Cristen J.
Publication date: 01/01/2010

Genome‐wide association studies (GWAS) can identify common alleles that contribute to complex disease susceptibility. Despite the large number of SNPs assessed in each study, the effects of most common SNPs must be evaluated indirectly, using either genotyped markers or haplotypes thereof as proxies. We have previously implemented a computationally efficient Markov chain framework for genotype imputation and haplotyping in the freely available MaCH software package. The approach describes sampled chromosomes as mosaics of each other and uses available genotype and shotgun sequence data to estimate unobserved genotypes and haplotypes, together with useful measures of the quality of these estimates. Our approach is already widely used to facilitate comparison of results across studies as well as meta‐analyses of GWAS. Here, we use simulations and experimental genotypes to evaluate its accuracy and utility, considering choices of genotyping panels, reference panel configurations, and designs where genotyping is replaced with shotgun sequencing. Importantly, we show that genotype imputation not only facilitates cross-study analyses but also increases the power of genetic association studies. We show that genotype imputation of common variants using HapMap haplotypes as a reference is very accurate using either genome‐wide SNP data or the smaller amounts of data typical of fine‐mapping studies. Furthermore, we show that the approach is applicable in a variety of populations. Finally, we illustrate how association analyses of unobserved variants will benefit from ongoing advances such as larger HapMap reference panels and whole genome shotgun sequencing technologies.
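The "mosaic" idea can be made concrete with a toy hidden Markov model in which a target haplotype copies one reference haplotype at a time, occasionally switching to another. This is a simplified haploid Li and Stephens-style sketch, not MaCH's implementation: MaCH works with unphased genotypes and estimates its own crossover and error parameters, whereas here the function name impute_haploid, the fixed switch and err values, and the tiny reference panel are all hypothetical. The forward-backward posterior over copied haplotypes gives an expected allele (dosage) at untyped markers.

```python
import numpy as np


def impute_haploid(ref, obs, switch=0.01, err=0.01):
    """Toy haploid copying-model imputation (illustration only, not MaCH).

    ref    : (K, M) array of 0/1 alleles for K reference haplotypes at M markers.
    obs    : length-M sequence with 0/1 at typed markers and None where untyped.
    switch : per-interval probability of jumping to a random reference haplotype
             (hypothetical fixed value; real tools estimate such parameters).
    err    : per-marker probability that the observed allele mismatches the copy.
    Returns the posterior probability of allele 1 at every marker.
    """
    ref = np.asarray(ref)
    K, M = ref.shape

    def emission(m):
        if obs[m] is None:
            return np.ones(K)  # untyped marker carries no information
        return np.where(ref[:, m] == obs[m], 1.0 - err, err)

    def transition(v):
        # Stay on the same haplotype with prob (1 - switch), otherwise jump
        # uniformly to any of the K haplotypes. The matrix is symmetric, so
        # the same step serves the forward and backward passes.
        return (1.0 - switch) * v + switch * v.sum() / K

    fwd = np.empty((M, K))
    fwd[0] = emission(0) / K
    fwd[0] /= fwd[0].sum()
    for m in range(1, M):
        fwd[m] = transition(fwd[m - 1]) * emission(m)
        fwd[m] /= fwd[m].sum()  # rescale to avoid numerical underflow

    bwd = np.ones((M, K))
    for m in range(M - 2, -1, -1):
        bwd[m] = transition(bwd[m + 1] * emission(m + 1))
        bwd[m] /= bwd[m].sum()

    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    # Dosage = posterior-weighted allele of the copied reference haplotype.
    return (post * ref.T).sum(axis=1)


ref = [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 1, 0, 1]]
obs = [1, 1, None, 1]  # marker 2 is untyped
print(np.round(impute_haploid(ref, obs), 3))
```

The posterior probabilities at untyped markers play the role of the quality-aware estimates the abstract mentions; a real tool would also report a quality measure derived from them.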

PubMed Central

Carolina Digital Repository

Deep Blue Documents at the University of Michigan

Genetic association study of age‐related macular degeneration in the Spanish population

Author: Abecasis Goncalo
Brión María
Carracedo Angel
Cortón Marta
de la Fuente Maria
Othman Mohammad
Pazos Belen
Sanchez‐salorio Manuel
Sobrino Beatriz
Swaroop Anand
Publication venue: Wiley
Publication date: 01/02/2011

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/110883/1/j.1755-3768.2010.02040.x.pd

Deep Blue Documents at the University of Michigan

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Author: Abecasis Goncalo R.
Cupples L. Adrienne
Hernandez Ryan D.
Jaquish Cashell E.
Laurie Cathy C.
McManus David D.
O'Connor Timothy D.
Taliun Daniel
Publication venue: eScholarship@UMassChan
Publication date: 10/02/2021

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) (1). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

eScholarship@UMMS

Whole Genome Sequencing in Psychiatric Disorders: the WGSPD Consortium

Author: Abecasis Goncalo
An Joon-Yong
Arguello P. Alexander
Blangero John
Boehnke Michael
Dong Shan
Huang Hailiang
Neale Benjamin M.
Sanders Stephan J.
Werling Donna M.
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/12/2017

As technology advances, whole genome sequencing (WGS) is likely to supersede other genotyping technologies. The rate of this change depends on its relative cost and utility. Variants identified uniquely through WGS may reveal novel biological pathways underlying complex disorders and provide high-resolution insight into when, where, and in which cell type these pathways are affected. Alternatively, cheaper and less computationally intensive approaches may yield equivalent insights. Understanding the role of rare variants in the noncoding, gene-regulating genome through pilot WGS projects will be critical to determine which of these two extremes best represents reality. With large cohorts, well-defined risk loci, and a compelling need to understand the underlying biology, psychiatric disorders have a role to play in this preliminary WGS assessment. The WGSPD consortium will integrate data for 18,000 individuals with psychiatric disorders, beginning with autism spectrum disorder, schizophrenia, bipolar disorder, and major depressive disorder, along with over 150,000 controls.

ScholarWorks @ UTRGV, University of Texas Rio Grande Valley


Analysis of 6,515 exomes reveals a recent origin of most human protein-coding variants

Author: Abecasis Goncalo
Akey Joshua M.
Altshuler David
Bamshad Michael J.
Fu Wenqing
Gabriel Stacey
GO Broad
GO Seattle
Jun Goo
Kang Hyun Min
Leal Suzanne M.
Nickerson Deborah A.
O'Connor Timothy D.
Shendure Jay
Publication venue: Springer Science and Business Media LLC
Publication date: 18/02/2014

Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history (1, 2) and will help facilitate the development of new approaches for disease gene discovery (3). Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth (4-6), notable for an excess of rare genetic variants, qualitatively suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European (n=4,298) and African (n=2,217) American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that ~73% of all protein-coding SNVs and ~86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs compared to other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, illustrate the profound effect recent human history has had on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease gene discovery.

Harvard University - DASH

Improving power of association tests using multiple sets of imputed genotypes from distributed reference panels

Author: Abecasis Goncalo R.
Chen Jin
Das Sayantan
Elvestad Maiken B.
Fritsche Lars G.
Holmen Oddgeir L.
Hveem Kristian
Kang Hyun Min
Lin Maoxuan
Nielsen Jonas B.
Willer Cristen J.
Zhang He
Zhou Wei
Publication venue: Wiley
Publication date: 01/12/2017

The accuracy of genotype imputation depends upon two factors: the sample size of the reference panel and the genetic similarity between the reference panel and the target samples. When consent restrictions prevent multiple reference panels from being combined, it is unclear how to combine the imputation results to optimize the power of genetic association studies. We compared the imputation accuracy for 9,265 Norwegian genomes using three reference panels: 1000 Genomes phase 3 (1000G), the Haplotype Reference Consortium (HRC), and a reference panel of 2,201 Norwegian participants from the population‐based Nord Trøndelag Health Study (HUNT), built from low‐pass genome sequencing. We observed that the population‐matched reference panel allowed imputation of more population‐specific variants with lower frequency (minor allele frequency (MAF) between 0.05% and 0.5%). The overall imputation accuracy from the population‐specific panel was substantially higher than that from 1000G and was comparable with that from HRC, despite HRC being 15‐fold larger. These results recapitulate the value of population‐specific reference panels for genotype imputation. We also evaluated different strategies for using multiple sets of imputed genotypes to increase the power of association studies. We observed that testing association for all variants imputed from any panel results in higher power to detect association than the alternative strategy of including only one version of each genetic variant, selected for having the highest imputation quality metric. This was particularly true for lower frequency variants (MAF < 1%), even after adjusting for the additional multiple testing burden.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/139954/1/gepi22067_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/139954/2/gepi22067.pd
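The two strategies compared in this abstract can be sketched as follows. Everything here is hypothetical: the record layout (ImputedResult), the function names, the toy p-values, and the choice of a Bonferroni correction over the number of tests actually run are assumptions for illustration, not the paper's data or its exact adjustment. The point is mechanical: keeping every panel's version of a variant adds tests (a stricter threshold) but can retain a version whose association signal the "best quality only" filter would discard.

```python
from dataclasses import dataclass


@dataclass
class ImputedResult:
    variant: str   # variant identifier (hypothetical format)
    panel: str     # reference panel label, e.g. "HUNT" or "HRC"
    r2: float      # imputation quality metric reported for this version
    pvalue: float  # association p-value for this imputed version


def best_quality_only(results):
    """Keep one version per variant: the one with the highest quality metric."""
    best = {}
    for r in results:
        if r.variant not in best or r.r2 > best[r.variant].r2:
            best[r.variant] = r
    return list(best.values())


def significant(results, alpha=0.05):
    """Bonferroni over the number of tests actually performed (an assumption)."""
    threshold = alpha / len(results)
    return {r.variant for r in results if r.pvalue < threshold}


# Toy results, invented for illustration only.
results = [
    ImputedResult("v1", "HUNT", 0.80, 1e-4),
    ImputedResult("v1", "HRC", 0.90, 0.2),
    ImputedResult("v2", "HUNT", 0.95, 0.5),
    ImputedResult("v2", "HRC", 0.70, 0.6),
]
print(significant(results))                     # all versions tested
print(significant(best_quality_only(results)))  # one version per variant
```

In this toy case the all-versions strategy detects v1 despite a threshold of 0.05/4, while the best-quality filter keeps only the HRC version of v1 and detects nothing.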

Deep Blue Documents at the University of Michigan

Analysis of long non-coding RNAs highlights tissue-specific expression patterns and epigenetic profiles in normal and psoriatic skin

Author: Abecasis Goncalo R
Chinnaiyan Arul M
Ding Jun
Elder James T
Gudjonsson Johann E
Iyer Matthew K
Kang Hyun M
Li Bingshan
Nair Rajan P
Sarkar Mrinal K
Stuart Philip E
Swindell William R
Tejasvi Trilokraj
Tsoi Lam C
Voorhees John J
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Abstract
Background: Although analysis pipelines have been developed to use RNA-seq to identify long non-coding RNAs (lncRNAs), inference of their biological and pathological relevance remains a challenge. As a result, most transcriptome studies of autoimmune disease have assessed only protein-coding transcripts.
Results: We used RNA-seq data from 99 lesional psoriatic, 27 uninvolved psoriatic, and 90 normal skin biopsies, and applied computational approaches to identify and characterize expressed lncRNAs. We detect 2,942 previously annotated and 1,080 novel lncRNAs, which are expected to be skin specific. Notably, over 40% of the novel lncRNAs are differentially expressed, and the proportions of differentially expressed transcripts among protein-coding mRNAs and previously annotated lncRNAs are lower in psoriasis lesions versus uninvolved or normal skin. We find that many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune-related functions, and that novel lncRNAs are enriched for localization in the epidermal differentiation complex. We also identify distinct tissue-specific expression patterns and epigenetic profiles for novel lncRNAs, some of which are shown to be regulated by cytokine treatment in cultured human keratinocytes.
Conclusions: Together, our results implicate many lncRNAs in the immunopathogenesis of psoriasis and provide a resource for lncRNA studies in other autoimmune diseases.
http://deepblue.lib.umich.edu/bitstream/2027.42/110307/1/13059_2014_Article_570.pd

Springer - Publisher Connector

PubMed Central

Deep Blue Documents at the University of Michigan

Independent test assessment using the extreme value distribution theory

Author: Abecasis Goncalo
Almeida Marcio
Blangero John
Blondell Lucy
Cingolani Pablo E
Duggirala Ravindranath
Dyer Thomas D
Frayling Timothy M
Fuchsberger Christian
Jun Goo
Kent Jack W
Manning Alisa K
Peralta Juan M
Sladek Robert
Teslovich Tanya M
Wood Andrew R
Publication venue: Springer Science and Business Media LLC
Publication date: 18/10/2016

Abstract: The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by overreducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first method estimates the actual number of effective independent tests performed in a whole genome association project by using an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, genotypes are more strongly correlated than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for the systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis test by exploiting prior computational predictions of the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach identified 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation; the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association studies conducted in an Asian population. Our results highlight the need to develop alternative approaches that improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/134747/1/12919_2016_Article_38.pd
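The idea behind estimating an effective number of independent tests can be sketched with the extreme value distribution of the minimum p-value. If M_eff independent null tests are performed, the smallest p-value satisfies P(min p <= x) = 1 - (1 - x)^M_eff, a Beta(1, M_eff) distribution, and the maximum-likelihood estimate from null replicates is M_eff = -n / sum(log(1 - min_p)). This is a minimal sketch under that assumption, not necessarily the authors' exact fitting procedure: their null replicates come from phenotype simulations on real pedigree genotypes, whereas the helper simulate_min_p below (hypothetical) mimics correlation by simply copying each independent test five times.

```python
import numpy as np


def estimate_meff(min_pvalues):
    """MLE of the effective number of independent tests from null minimum p-values.

    Assumes min p ~ Beta(1, M_eff), i.e. P(min p <= x) = 1 - (1 - x) ** M_eff.
    """
    p = np.asarray(min_pvalues, dtype=float)
    return -len(p) / np.log1p(-p).sum()


def simulate_min_p(n_reps, n_blocks, copies, rng):
    """Minimum null p-value per replicate for a toy correlated test set.

    Each of n_blocks independent uniform p-values is repeated `copies` times,
    mimicking perfectly correlated markers (a hypothetical stand-in for LD
    and shared pedigree ancestry).
    """
    p = rng.uniform(size=(n_reps, n_blocks))
    p = np.repeat(p, copies, axis=1)
    return p.min(axis=1)


rng = np.random.default_rng(1)
min_p = simulate_min_p(2000, 200, 5, rng)
meff = estimate_meff(min_p)
# Sidak-style per-test threshold for a family-wise error rate of 0.05.
alpha_per_test = 1 - (1 - 0.05) ** (1 / meff)
print(f"total tests: {200 * 5}, effective: {meff:.0f}, per-test alpha: {alpha_per_test:.2e}")
```

Here about 1,000 tests behave like roughly 200 independent ones, so the corrected threshold is less stringent than a Bonferroni correction over all 1,000 tests would be.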

PubMed Central

Deep Blue Documents at the University of Michigan

Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees

Author: Andrew R Wood
Christian Fuchsberger
Donna Lehman
Goncalo Abecasis
Goo Jun
Jack W Kent
Joanne E Curran
John Blangero
Juan M Peralta
Laura Almasy
Marcio A Almeida
Ravindranath Duggirala
Satish Kumar
Sharon Fowler
Sobha Puppala
Thomas D Dyer
Tom W Blackwell
Publication venue: Springer Nature
Publication date: 17/06/2014

Genetic Analysis Workshop 18 (GAW18) focused on the identification of genes and functional variants that influence complex phenotypes in human sequence data. Data for the workshop were donated by the T2D-GENES Consortium and included whole genome sequences for odd-numbered autosomes in 464 key individuals selected from 20 Mexican American families, a dense set of single-nucleotide polymorphisms in 959 individuals in these families, and longitudinal data on systolic and diastolic blood pressure measured at 1-4 examinations over a period of 20 years. Simulated phenotypes were generated based on the real sequence data and pedigree structures. In the design of the simulation model, gene expression measures from the San Antonio Family Heart Study (not distributed as part of the GAW18 data) were used to identify genes whose mRNA levels were correlated with blood pressure. Observed variants within these genes were designated as functional in the GAW18 simulation if they were nonsynonymous and predicted to have deleterious effects on protein function, or if they were noncoding and associated with mRNA levels. Two simulated longitudinal phenotypes were modeled to have the same trait distributions as the real systolic and diastolic blood pressure data, with effects of age, sex, and medication use, including a genotype-medication interaction. For each phenotype, more than 1000 sequence variants in more than 200 genes present on the odd-numbered autosomes individually explained from less than 0.01% to 2.78% of phenotypic variance. Cumulatively, variants in the most influential gene explained 7.79% of trait variance. An additional simulated phenotype, Q1, was designed to be correlated among family members but not associated with any sequence variants. Two hundred replicates of the phenotypes were simulated, each including data for 849 individuals.

Springer - Publisher Connector

PubMed Central

Deep Blue Documents at the University of Michigan